 Manatee County


Integrative Decoding: Improve Factuality via Implicit Self-consistency

arXiv.org Artificial Intelligence

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID) to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processing them concurrently, with the next token selected by aggregating all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
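The decoding loop described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the function names, the `toy_model` stand-in for a language model, and the choice to aggregate by summing log-probabilities are all assumptions made for the example; the abstract only says that the predictions are aggregated at each step.

```python
import math
from typing import Callable, Dict, List

# Hypothetical interface: maps (context, tokens generated so far) to
# log-probabilities over a shared vocabulary.
NextTokenFn = Callable[[str, List[str]], Dict[str, float]]


def integrative_decode(
    prompt: str,
    sampled_responses: List[str],
    next_token_logprobs: NextTokenFn,
    eos: str = "<eos>",
    max_steps: int = 20,
) -> List[str]:
    # One input per previously sampled response, each prepended to the prompt.
    inputs = [f"{response}\n{prompt}" for response in sampled_responses]
    output: List[str] = []
    for _ in range(max_steps):
        # Assumption: aggregate by summing log-probabilities across inputs.
        totals: Dict[str, float] = {}
        for context in inputs:
            for token, logprob in next_token_logprobs(context, output).items():
                totals[token] = totals.get(token, 0.0) + logprob
        best = max(totals, key=totals.get)
        if best == eos:
            break
        output.append(best)
    return output


def toy_model(context: str, generated: List[str]) -> Dict[str, float]:
    # Toy stand-in for a language model: it leans towards the city named
    # in the prepended response, then stops after one token.
    if generated:
        probs = {"Paris": 0.05, "Lyon": 0.05, "<eos>": 0.9}
    elif "Paris" in context:
        probs = {"Paris": 0.6, "Lyon": 0.3, "<eos>": 0.1}
    else:
        probs = {"Paris": 0.3, "Lyon": 0.6, "<eos>": 0.1}
    return {token: math.log(p) for token, p in probs.items()}


answer = integrative_decode(
    "What is the capital of France?", ["Paris", "Paris", "Lyon"], toy_model
)
print(answer)
```

In this toy setting, the answer that most sampled responses agree on receives the highest aggregated score, which is the sense in which self-consistency enters the decoding objective implicitly.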


Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

arXiv.org Artificial Intelligence

People say, "A picture is worth a thousand words". Then how can we get the rich information out of the image? We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can do so without any extra cross-modal training. Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model. Based on the visual clues, we use a large language model to produce a series of comprehensive descriptions of the visual content, which are then verified by the vision model again to select the candidate that aligns best with the image. We evaluate the quality of the generated descriptions with quantitative and qualitative measurements. The results demonstrate the effectiveness of such a structured semantic representation.


NewsStories: Illustrating articles with visual summaries

arXiv.org Artificial Intelligence

Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume that there is a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and number of images. In addition, unlike prior work, which assumed captions have a literal relation to the image, we assume images have only a loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. Finally, we introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.


Macro-Average: Rare Types Are Important Too

arXiv.org Artificial Intelligence

While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy. Model-based MT metrics trained on segment-level human judgments have emerged as an attractive replacement due to strong correlation results. These models, however, require potentially expensive re-training for new domains and languages. Furthermore, their decisions are inherently non-transparent and appear to reflect unwelcome biases. We explore the simple type-based classifier metric, MacroF1, and study its applicability to MT evaluation. We find that MacroF1 is competitive on direct assessment, and outperforms others in indicating downstream cross-lingual information retrieval task performance. Further, we show that MacroF1 can be used to effectively compare supervised and unsupervised neural machine translation, and reveal significant qualitative differences in the methods' outputs.
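To illustrate why a macro-averaged, type-based score gives rare types the same weight as frequent ones, here is a minimal sketch. It is not the paper's exact formulation: the whitespace tokenization, the clipped per-segment matching, the unsmoothed F1, and averaging over the union of hypothesis and reference types are all assumptions made for the example, and `macro_f1` is a hypothetical helper name.

```python
from collections import Counter
from typing import List


def macro_f1(hypotheses: List[str], references: List[str]) -> float:
    """Unweighted mean of per-type F1 scores over a corpus."""
    hyp_total: Counter = Counter()
    ref_total: Counter = Counter()
    match_total: Counter = Counter()
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = Counter(hyp.split())
        ref_counts = Counter(ref.split())
        hyp_total.update(hyp_counts)
        ref_total.update(ref_counts)
        # Counter intersection keeps the minimum count of each type,
        # i.e. the clipped number of matches within this segment.
        match_total.update(hyp_counts & ref_counts)

    types = set(hyp_total) | set(ref_total)
    if not types:
        return 0.0

    scores = []
    for word_type in types:
        match = match_total[word_type]
        precision = match / hyp_total[word_type] if hyp_total[word_type] else 0.0
        recall = match / ref_total[word_type] if ref_total[word_type] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    # Every type contributes equally, regardless of how often it occurs.
    return sum(scores) / len(scores)


print(macro_f1(["the cat sat"], ["the cat slept"]))
```

Because each type contributes one F1 score to the average, a mistranslated rare content word costs as much as a mistranslated frequent function word, which is the property the title alludes to.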


Hurricane Irma Damage In Florida Shown In Drone Video

International Business Times

New drone video out of Florida captured an aerial view of the devastation wrought by Hurricane Irma in the Sunshine State. The video, taken by Travis Long and posted by the Miami Herald Wednesday, showed Irma's path of destruction in Manatee County, south of Tampa on the west coast. The video showed enormous trees ripped out of the ground by their roots, roofs torn clean off homes and overturned and sunken boats. At least one person could be seen in the video working to restore a home amid the wreckage. President Donald Trump headed down to Florida Thursday to determine the extent of the damage left by the record-breaking hurricane.


The future of fast food: KFC opens restaurant run by AI ROBOTS in Shanghai

Daily Mail - Science & tech

For over 60 years, KFC restaurants have been serving the same secret original recipe to patrons. But Colonel Sanders is going against tradition with a new concept store located in Shanghai, China, that lets customers order fried chicken from a voice-activated robot. Dubbed 'Dumi', the robot is smart enough to handle order changes and substitutes, but its creators say it cannot distinguish other dialects or accents. Dumi is employed at a concept KFC store called 'Original '.